Analyzing Relatedness by Toponym Co-Occurrences on Web Pages
Authors
Abstract
This research proposes a method for capturing “relatedness between geographical entities” based on the co-occurrences of their names on web pages. The basic assumption is that a higher count of co-occurrences of two geographical places implies a stronger relatedness between them. The spatial structure of China at the provincial level is explored from the co-occurrences of two provincial units in one document, extracted by a web information retrieval engine. Analysis of the co-occurrences and topological distances between all pairs of provinces indicates that: (1) spatially close provinces generally have similar co-occurrence patterns; (2) the frequency of co-occurrences exhibits a power-law distance-decay effect with an exponent of 0.2; and (3) the co-occurrence matrix can be used to capture the similarity/linkage between neighboring provinces and fed into a regionalization method to examine the spatial organization of China. The proposed method provides a promising approach to extracting valuable geographical information from massive collections of web pages.
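The power-law distance decay reported in the abstract can be illustrated with a minimal sketch. It assumes the relationship has the form count = a * distance^(-b) and estimates a and b by ordinary least squares on the log-transformed values. The function name `fit_power_law` and the sample values are hypothetical and synthetic, generated only to show the fitting step; they are not data from the paper, and the abstract does not state which fitting procedure the authors used.

```python
import math


def fit_power_law(distances, counts):
    """Fit count = a * distance**(-b) by least squares in log-log space.

    Returns the pair (a, b). All inputs must be strictly positive.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the regression line log(count) = intercept + slope * log(distance).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The decay exponent b is the negated slope; a is the count at distance 1.
    return math.exp(intercept), -slope


# Synthetic, noise-free illustration (NOT data from the paper):
# topological distances 1..6 and counts generated with exponent 0.2.
distances = [1, 2, 3, 4, 5, 6]
counts = [1000.0 * d ** -0.2 for d in distances]
a, b = fit_power_law(distances, counts)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real co-occurrence counts the points would scatter around the fitted line, so the recovered exponent would only approximate the underlying decay rate.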
Similar resources
Tag Cloud Reorganization: Finding Groups of Related Tags on Delicious
Tag clouds have become an appealing way of navigating through web pages on social tagging systems. Recent research has focused on finding relations among tags to improve visualization and access to web documents from tag clouds. Reorganizing tag clouds according to tag relatedness has been suggested as an effective solution to ease navigation. Most of the approaches either rely on co-occurrence...
Assigning and Visualizing Music Genres by Web-based Co-Occurrence Analysis
We explore a simple, web-based method for predicting the genre of a given artist based on co-occurrence analysis, i.e. analyzing co-occurrences of artist and genre names on music-related web pages. To this end, we use the page counts provided by Google to estimate the relatedness of an arbitrary artist to each of a set of genres. We investigate four different query schemes for obtainin...
Analyzing new features of infected web content in detection of malicious web pages
Recent improvements in web standards and technologies enable the attackers to hide and obfuscate infectious codes with new methods and thus escaping the security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...
Tagging Artists using Co-Occurrences on the Web
We present an efficient unsupervised approach in finding subjective artist meta-data on the world wide web. Since we are interested in the collective knowledge on artists as available on the web, our method is based on the extraction of information from multiple web pages. We use co-occurrences of pairs of artists on the web to identify similarity between artists. To determine the applicability...
Investigating Different Term Weighting Functions for Browsing Artist-Related Web Pages by Means of Term Co-Occurrences
We present a user interface (UI) for browsing collections of web pages about music artists. Given such a collection, we use a term list to index its contents and to derive term co-occurrences. Based on these co-occurrences, we create a UI that employs a variant of the Sunburst visualization technique. The UI is embedded in CoMIRVA, our framework for music information retrieval and visualization...
Journal title:
- Trans. GIS
Volume 18, Issue
Pages -
Publication date 2014